
    SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild

    We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical Lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real-world images. This allows the network to capture low-frequency variations from synthetic images and high-frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation. Comment: Accepted to CVPR 2018 (Spotlight).
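
    For illustration, here is a minimal sketch of the Lambertian image-formation step and photometric reconstruction loss this abstract refers to, assuming second-order spherical-harmonics lighting; the tensor shapes, the unnormalized SH basis (constants folded into the lighting coefficients), and the L1 loss are assumptions for the sketch, not SfSNet's actual implementation.

        import torch
        import torch.nn.functional as F

        def sh_basis(normals):
            # normals: (B, 3, H, W) unit surface normals -> (B, 9, H, W) SH basis
            nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
            ones = torch.ones_like(nx)
            return torch.stack([ones, nx, ny, nz,
                                nx * ny, nx * nz, ny * nz,
                                nx * nx - ny * ny, 3.0 * nz * nz - 1.0], dim=1)

        def render_lambertian(albedo, normals, light):
            # albedo, normals: (B, 3, H, W); light: (B, 27) = 9 SH coefficients per RGB channel
            basis = sh_basis(normals)                                        # (B, 9, H, W)
            shading = torch.einsum('bchw,bkc->bkhw', basis, light.view(-1, 3, 9))
            return albedo * shading                                          # Lambertian image formation

        def photometric_loss(albedo, normals, light, image):
            # Self-supervised reconstruction loss, usable on unlabeled real images.
            return F.l1_loss(render_lambertian(albedo, normals, light), image)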

    Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation

    Intrinsic image decomposition and inverse rendering are long-standing problems in computer vision. To evaluate albedo recovery, most algorithms report their quantitative performance with a mean Weighted Human Disagreement Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative albedo values and often fails to capture overall quality of the albedo. In order to comprehensively evaluate albedo, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity and texture metrics. We show that existing algorithms often improve the WHDR metric but perform poorly on other metrics. We then finetune different algorithms on our MAW dataset to significantly improve the quality of the reconstructed albedo both quantitatively and qualitatively. Since the proposed intensity, chromaticity, and texture metrics and the WHDR are all complementary, we further introduce a relative performance measure that captures average performance. By analysing existing algorithms we show that there is significant room for improvement. Our dataset and evaluation metrics will enable researchers to develop algorithms that improve albedo reconstruction. Code and data available at: https://measuredalbedo.github.io/ Comment: Accepted into ICCP202
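
    For context, a minimal sketch of the WHDR metric discussed above, following the common IIW convention of pairwise darker/equal judgments with a relative threshold delta = 0.1; the judgment encoding and data layout here are illustrative assumptions.

        def whdr(albedo, judgements, delta=0.1):
            # albedo: 2D array of predicted reflectance intensities
            # judgements: iterable of (y1, x1, y2, x2, label, weight), label in {'1', '2', 'E'}
            wrong, total = 0.0, 0.0
            for y1, x1, y2, x2, label, weight in judgements:
                r1 = max(float(albedo[y1, x1]), 1e-10)
                r2 = max(float(albedo[y2, x2]), 1e-10)
                if r2 / r1 > 1.0 + delta:
                    pred = '1'      # point 1 predicted darker
                elif r1 / r2 > 1.0 + delta:
                    pred = '2'      # point 2 predicted darker
                else:
                    pred = 'E'      # predicted roughly equal
                if pred != label:
                    wrong += weight
                total += weight
            return wrong / max(total, 1e-10)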

    NePhi: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration

    This work proposes NePhi, a neural deformation model which results in approximately diffeomorphic transformations. In contrast to the predominant voxel-based approaches, NePhi represents deformations functionally, which allows for memory-efficient training and inference. This is of particular importance for large volumetric registrations. Further, while medical image registration approaches representing transformation maps via multi-layer perceptrons have been proposed, NePhi facilitates both pairwise optimization-based registration as well as learning-based registration via predicted or optimized global and local latent codes. Lastly, as deformation regularity is a highly desirable property for most medical image registration tasks, NePhi makes use of gradient inverse consistency regularization which empirically results in approximately diffeomorphic transformations. We show the performance of NePhi on two 2D synthetic datasets as well as on real 3D lung registration. Our results show that NePhi can achieve similar accuracies as voxel-based representations in a single-resolution registration setting while using less memory and allowing for faster instance-optimization.
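
    As a rough illustration of the two ingredients named above, the sketch below pairs a coordinate MLP mapping a point and a latent code to a displacement with a finite-difference gradient-inverse-consistency penalty on the composed forward/backward maps; the network size, latent dimension, and step size are assumptions for the sketch, not NePhi's actual configuration.

        import torch
        import torch.nn as nn

        class DeformationField(nn.Module):
            # Functional (coordinate-based) deformation: phi(x) = x + MLP([x, z]).
            def __init__(self, latent_dim=64, hidden=128, dim=3):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(dim + latent_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, dim))

            def forward(self, x, z):
                # x: (N, dim) coordinates, z: (N, latent_dim) global/local latent codes
                return x + self.mlp(torch.cat([x, z], dim=-1))

        def gradient_inverse_consistency(phi_ab, phi_ba, x, z_ab, z_ba, eps=1e-3):
            # Penalize deviation of the Jacobian of phi_AB(phi_BA(x)) from the identity,
            # estimated here with forward finite differences.
            compose = lambda p: phi_ab(phi_ba(p, z_ba), z_ab)
            y = compose(x)
            loss = x.new_zeros(())
            for d in range(x.shape[-1]):
                step = torch.zeros_like(x)
                step[:, d] = eps
                col = (compose(x + step) - y) / eps      # d(composition) / d x_d
                target = torch.zeros_like(col)
                target[:, d] = 1.0
                loss = loss + ((col - target) ** 2).sum(-1).mean()
            return loss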

    Bringing Telepresence to Every Desk

    In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capturing and rendering system that only requires 4 consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos of users as well as their environments. Experimental results show that our system renders high-quality free-viewpoint videos without using object templates or heavy pre-processing. While not real-time, our system is fast and does not require per-video optimizations. Moreover, our system is robust to complex hand gestures and clothing, and it can generalize to new users. This work provides a strong basis for further optimization, and it will help bring telepresence to every desk in the near future. The code and dataset will be made available on our website https://mcmvmc.github.io/PersonalTelepresence/

    Universal Guidance for Diffusion Models

    Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at https://github.com/arpitbansal297/Universal-Guided-Diffusion
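
    A minimal sketch of the guidance idea described above: at each denoising step the clean image is estimated from the noisy sample, an arbitrary off-the-shelf loss (classifier, face recognition, detection, segmentation) is evaluated on that estimate, and its gradient steers the predicted noise. The DDIM-style update and the scale s are assumptions for the sketch, not the paper's exact procedure or hyper-parameters.

        import torch

        def guided_ddim_step(x_t, t, eps_model, alpha_bar, guidance_loss, s=1.0):
            # x_t: current noisy sample; eps_model(x_t, t) -> predicted noise;
            # alpha_bar: 1-D tensor of cumulative alphas; guidance_loss: any differentiable criterion.
            x_t = x_t.detach().requires_grad_(True)
            a = alpha_bar[t]
            eps = eps_model(x_t, t)
            x0_hat = (x_t - (1 - a).sqrt() * eps) / a.sqrt()       # predicted clean image
            grad = torch.autograd.grad(guidance_loss(x0_hat), x_t)[0]
            eps_guided = eps + s * (1 - a).sqrt() * grad           # steer the predicted noise
            a_prev = alpha_bar[t - 1] if t > 0 else x_t.new_tensor(1.0)
            x0_hat = (x_t - (1 - a).sqrt() * eps_guided) / a.sqrt()
            return (a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps_guided).detach()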

    Constraints and Priors for Inverse Rendering from Limited Observations

    Inverse Rendering deals with recovering the underlying intrinsic components of an image, i.e. geometry, reflectance, illumination and the camera with which the image was captured. Inferring these intrinsic components of an image is a fundamental problem in Computer Vision. Solving Inverse Rendering unlocks a host of real-world applications in Augmented and Virtual Reality, Robotics, Computational Photography, and gaming. Researchers have made significant progress in solving Inverse Rendering from a large number of images of an object or a scene under relatively constrained settings. However, most real life applications rely on a single or a small number of images captured in an unconstrained environment. Thus in this thesis, we explore Inverse Rendering under limited observations from unconstrained images. We consider two different approaches for solving Inverse Rendering under limited observations. First, we consider learning data-driven priors that can be used for Inverse Rendering from a single image. Our goal is to jointly learn all intrinsic components of an image, such that we can recombine them and train on unlabeled real data using a self-supervised reconstruction loss. A key component that enables self-supervision is a differentiable rendering module that can combine the intrinsic components to accurately regenerate the image. We show how such a self-supervised reconstruction loss can be used for Inverse Rendering of faces. While this is relatively straightforward for faces, complex appearance effects (e.g. inter-reflections, cast-shadows, and near-field lighting) present in a scene cannot be captured with a differentiable rendering module. Thus we also propose a deep CNN based differentiable rendering module (Residual Appearance Renderer) that can capture these complex appearance effects and enable self-supervised learning. Another contribution is a novel Inverse Rendering architecture, SfSNet, that performs Inverse Rendering for faces and scenes. Second, we consider enforcing low-rank multi-view constraints in an optimization framework to enable Inverse Rendering from a few images. To this end, we propose a novel multi-view rank constraint that connects all cameras capturing all the images in a scene and is enforced to ensure accurate camera recovery. We also jointly enforce a low-rank constraint and remove ambiguity to perform accurate Uncalibrated Photometric Stereo from a few images. In these problems, we formulate a constrained low-rank optimization problem in the presence of noisy estimates and missing data. Our proposed optimization framework can handle this non-convex optimization using the Alternating Direction Method of Multipliers (ADMM). Given a few images, enforcing low-rank constraints significantly improves Inverse Rendering.
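
    To make the second ingredient concrete, the sketch below shows a generic ADMM scheme for recovering a low-rank matrix from noisy, partially observed entries, using the nuclear norm as a convex surrogate for rank; the splitting and the penalty parameter rho are textbook choices, not the specific multi-view or uncalibrated photometric stereo formulation used in the thesis.

        import numpy as np

        def svt(X, tau):
            # Singular-value thresholding: proximal operator of the nuclear norm.
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        def lowrank_complete(M, mask, rho=1.0, lam=1.0, iters=200):
            # M: observed matrix (noisy, zeros where unobserved); mask: 1 = observed, 0 = missing.
            X = np.zeros_like(M)   # low-rank variable
            Z = np.zeros_like(M)   # data-fit variable
            U = np.zeros_like(M)   # scaled dual variable
            for _ in range(iters):
                X = svt(Z - U, lam / rho)                          # prox of the nuclear norm
                Z = (mask * M + rho * (X + U)) / (mask + rho)      # prox of the masked least-squares term
                U = U + X - Z                                      # dual update
            return X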

    MVPSNet: Fast Generalizable Multi-view Photometric Stereo

    We propose a fast and generalizable solution to Multi-view Photometric Stereo (MVPS), called MVPSNet. The key to our approach is a feature extraction network that effectively combines images from the same view captured under multiple lighting conditions to extract geometric features from shading cues for stereo matching. We demonstrate these features, termed 'Light Aggregated Feature Maps' (LAFM), are effective for feature matching even in textureless regions, where traditional multi-view stereo methods fail. Our method produces similar reconstruction results to PS-NeRF, a state-of-the-art MVPS method that optimizes a neural network per-scene, while being 411× faster (105 seconds vs. 12 hours) in inference. Additionally, we introduce a new synthetic dataset for MVPS, sMVPS, which is shown to be effective for training a generalizable MVPS method.
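
    A minimal sketch of the light-aggregation idea behind LAFM: per-image features from the same viewpoint under different lights are pooled across the lighting dimension into a single feature map per view. The shared encoder and the max/mean pooling are assumptions for the sketch, not MVPSNet's actual architecture.

        import torch
        import torch.nn as nn

        class LightAggregatedFeatures(nn.Module):
            def __init__(self, feat_dim=32):
                super().__init__()
                self.encoder = nn.Sequential(        # shared 2D encoder applied to each image
                    nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU())

            def forward(self, images):
                # images: (B, L, 3, H, W) -- L images of the same view under different lights
                b, l, c, h, w = images.shape
                feats = self.encoder(images.reshape(b * l, c, h, w)).reshape(b, l, -1, h, w)
                # aggregate shading cues over the lighting dimension into one map per view
                return torch.cat([feats.max(dim=1).values, feats.mean(dim=1)], dim=1)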

    A dynamic neighborhood learning based particle swarm optimizer for global numerical optimization

    The concept of particle swarms originated from the simulation of the social behavior commonly observed in the animal kingdom and has evolved into a very simple but efficient optimization technique. Since its advent in 1995, the Particle Swarm Optimization (PSO) algorithm has attracted the attention of many researchers worldwide, resulting in a large number of variants of the basic algorithm as well as many parameter selection/control strategies. PSO relies on the learning strategy of the individuals to guide its search direction. Traditionally, each particle utilizes its historical best experience as well as the global best experience of the whole swarm through linear summation. The Comprehensive Learning PSO (CLPSO) was proposed as a powerful variant of PSO that enhances the diversity of the population by encouraging each particle to learn from different particles on different dimensions, based on the observation that the best particle, despite having the highest fitness, does not necessarily offer the best value in every dimension. This paper presents a variant of single-objective PSO called the Dynamic Neighborhood Learning Particle Swarm Optimizer (DNLPSO), which uses a learning strategy whereby the historical best information of other particles is used to update a particle's velocity, as in CLPSO. In contrast to CLPSO, however, the exemplar particle in DNLPSO is selected from a neighborhood. This strategy enables the learner particle to learn from the historical information of its neighborhood or sometimes from its own. Moreover, the neighborhoods are dynamic, i.e. they are regrouped after certain intervals. This helps preserve the diversity of the swarm and discourages premature convergence. Experiments were conducted on 16 numerical benchmarks in 10, 30 and 50 dimensions, a set of five constrained benchmarks, and a practical engineering optimization problem concerning spread-spectrum radar poly-phase code design. The results demonstrate very competitive performance of DNLPSO in locating the global optimum on complicated and multimodal fitness landscapes when compared with five other recent PSO variants.
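
    A minimal sketch of a velocity update in this spirit: for each dimension, with some probability the particle learns from the best personal-best in its current neighborhood, otherwise from its own personal best. The inertia weight, acceleration constant, learning probability, and the exemplar-selection rule are assumptions for the sketch rather than the settings reported in the paper.

        import numpy as np

        def dnlpso_velocity_update(pos, vel, pbest, pbest_fit, neighborhoods, w=0.7, c=1.5, pc=0.3):
            # pos, vel, pbest: (N, D) arrays; pbest_fit: (N,) fitness of personal bests (minimization);
            # neighborhoods: list of index arrays, regrouped every few iterations to keep them dynamic.
            n, d = pos.shape
            new_vel = np.empty_like(vel)
            for i in range(n):
                hood = neighborhoods[i]
                for j in range(d):
                    if np.random.rand() < pc:
                        exemplar = hood[np.argmin(pbest_fit[hood])]   # learn from the neighborhood's best
                    else:
                        exemplar = i                                  # learn from its own personal best
                    new_vel[i, j] = w * vel[i, j] + c * np.random.rand() * (pbest[exemplar, j] - pos[i, j])
            return new_vel

    The position update then follows as pos + new_vel, typically clipped to the search bounds, with the neighborhoods list reshuffled after a fixed number of iterations.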